{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 反向传播(Backpropagation)笔记\n",
    "\n",
    "反向传播是深度学习的基石。\n",
    "\n",
    "## 导数\n",
    "\n",
    "先回顾下导数：\n",
    "\n",
    "$$\\frac{df(x)}{dx}=\\lim_{h->0}\\frac{f(x+h)-f(x)}{h}$$\n",
    "\n",
    "函数在每个变量的导数就是偏导数。\n",
    "\n",
    "对于函数$f(x,y)=x+y$，$\\frac{\\partial f}{\\partial x}=1$，同时，$\\frac{\\partial f}{\\partial y}=1$\n",
    "\n",
    "梯度就是偏导数组成的矢量。上述例子中，$\\Delta f=[\\frac{\\partial f}{\\partial x},\\frac{\\partial f}{\\partial y}]$。\n",
    "\n",
    "\n",
    "## 链式法则\n",
    "\n",
    "对于简单函数，我们可以根据公式直接计算出其导数。但是对于复杂的函数，我们就没那么容易直接写出导数。但是我们有**链式法则(chain rule)**。\n",
    "\n",
    "定义不多说，咱们举个例子，感受一下链式法则的魅力。\n",
    "\n",
    "我们熟悉的sigmoid函数$\\sigma(x)=\\frac{1}{1+e^{-x}}$，如果你记不住它的导数，我们怎么求解呢？\n",
    "\n",
    "求解步骤如下：\n",
    "\n",
    "* 将函数模块化，分成多个基本的部分，对于每一个部分都可以使用简单的求导法则进行求导\n",
    "* 使用链式法则，将这些导数链接起来，计算出最终的导数\n",
    "\n",
    "具体如下：\n",
    "\n",
    "\n",
    "* 令$a=x$，则 $\\frac{\\partial a}{\\partial x}=1$\n",
    "* 令$b=-a$，则 $\\frac{\\partial b}{\\partial a}=-1$\n",
    "* 令$c=e^{b}$，则 $\\frac{\\partial c}{\\partial b}=e^{b}$\n",
    "* 令$d=1+c$，则 $\\frac{\\partial d}{\\partial c}=1$\n",
    "* 令$e=\\frac{1}{d}$，则 $\\frac{\\partial e}{\\partial d}=\\frac{-1}{d^2}$\n",
    "\n",
    "上面的e实际上就是我们的$\\sigma(x)$，那么根据链式法则，有：\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\frac{\\partial \\sigma(x)}{\\partial x} &=\\frac{\\partial e}{\\partial x} \\\\\n",
    "                                                       &=\\frac{\\partial e}{\\partial d}\\cdot\\frac{\\partial d}{\\partial c}\\cdot\\frac{\\partial c}{\\partial b}\\cdot\\frac{\\partial b}{\\partial a}\\cdot\\frac{\\partial a}{\\partial x} \\\\\n",
    "                                                       &=\\frac{-1}{d^2}\\cdot1\\cdot e^{b}\\cdot-1\\cdot1 \\\\\n",
    "                                                       &=\\frac{e^b}{d^{2}} \\\\\n",
    "                                                       &=\\frac{e^{-x}}{(1+e^{-x})^2} \\\\\n",
    "                                                       &=(1-\\sigma(x))\\cdot\\sigma(x)\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "sigmoid函数的导数可以直接用自身表示，这也是很奇妙的性质了。这样的求导过程是不是很简单？\n",
    "\n",
    "## 反向传播代码实现\n",
    "\n",
    "求导和链式法则我都会了，那么具体的前向传播和反向传播的代码是怎么样的呢？\n",
    "\n",
    "这次我们使用一个更复杂一点点的例子：\n",
    "\n",
    "$$f(x,y)=\\frac{x+\\sigma(x)}{\\sigma(x)+(x+y)^2}$$\n",
    "\n",
    "我们先看下它地forward pass代码："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "x = 3\n",
    "y = -4\n",
    "\n",
    "sigy = 1.0 / (1 + math.exp(-y)) # sigmoid function\n",
    "num = x + sigy # 分子\n",
    "sigx = 1.0 / (1 + math.exp(-x))\n",
    "xpy = x + y\n",
    "xpy_sqr = xpy**2\n",
    "den = sigx + xpy_sqr # 分母\n",
    "invden = 1.0 / den\n",
    "f = num * invden # 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上述过程很简单对不对，就是把复杂的函数拆解成一个一个简单函数。\n",
    "\n",
    "我们看看接下来的反向传播过程："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dnum = invden"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为$f = num * invden$，所以有$\\frac{\\partial f}{\\partial num} = invden$，也就是$dnum = invden$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dinvden = num # 同理\n",
    "\n",
    "dden = (-1.0 / (den**2)) * dinvden # 链式法则"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "展开来说：$\\frac{\\partial invden}{\\partial den}=\\frac{-1}{den^2}$，又$\\frac{\\partial f}{\\partial invden}=num$，所以$dden=\\frac{\\partial f}{\\partial den}=\\frac{partial f}{\\partial invden}\\cdot \\frac{\\partial invden}{\\partial den} = \\frac{-1.0}{den^2}\\cdot dinvden$\n",
    "\n",
    "所以，同理，我们可以写出所有的导数："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dsigx = (1) * dden \n",
    "dxpy_sqr = (1) * dden\n",
    "\n",
    "dxpy = (2 * xpy) * dxpy_sqr\n",
    "\n",
    "# backprob xpy = x + y\n",
    "dx = (1) * dxpy\n",
    "dy = (1) * dxpy\n",
    "\n",
    "# 这里开始，请注意使用的是\"+=\"，而不是\"=”\n",
    "dx += ((1 - sigx) * sigx) * dsigx # dsigma(x) = (1 - sigma(x))*sigma(x)\n",
    "dx += (1) * dnum\n",
    "\n",
    "# backprob num = x + sigy\n",
    "dsigy = (1) * dnum\n",
    "# 注意“+=”\n",
    "dy += ((1 - sigy) * sigy) * dsigy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "问题：\n",
    "\n",
    "* 上面计算过程中，为什么要用“+=”替代“=”呢？\n",
    "\n",
    "如果变量x，y在前向传播的表达式中出现多次，那么进行反向传播的时候就要非常小心，使用+=而不是=来累计这些变量的梯度（不然就会造成覆写）。这是遵循了在微积分中的多元链式法则，该法则指出如果变量在线路中分支走向不同的部分，那么梯度在回传的时候，就应该进行累加。\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
